


@hyp3rd hyp3rd commented Aug 23, 2025

No description provided.

hyp3rd added 7 commits August 23, 2025 11:55
…l to distributed cache

- Add hinted handoff mechanism to queue writes for temporarily unavailable replicas
  * Queue hints with TTL and replay them when nodes recover
  * Add configuration options for hint TTL, replay interval, and max hints per node
  * Include metrics for queued, replayed, expired, and dropped hints

- Implement parallel reads for improved performance
  * Enable concurrent fan-out to replica nodes for quorum/all consistency
  * Early termination when quorum is satisfied
  * Configurable via WithDistParallelReads option

- Add simple gossip protocol for membership information sharing
  * Periodic random peer selection for gossip exchange
  * Automatic node state synchronization based on incarnation numbers
  * Configurable gossip interval

- Enhance replica failure handling in replicateTo method
- Add comprehensive metrics tracking for new distributed features
- Update cspell configuration to include nosec directive

This significantly improves the resilience and performance of the distributed
cache system by handling temporary node failures gracefully and enabling
more efficient read operations.

Add comprehensive Merkle tree-based anti-entropy mechanism for distributed cache synchronization:

- Implement BuildMerkleTree() to create hash trees from cache data with configurable chunk sizes
- Add SyncWith() method for comparing local/remote trees and pulling newer versions
- Extend DistTransport interface with FetchMerkle() for remote tree retrieval
- Add /internal/merkle HTTP endpoint for tree access over network
- Include merkle sync metrics (operations count and keys pulled)
- Add comprehensive test coverage for sync convergence scenarios
- Support both in-process and HTTP transports (HTTP FetchMerkle marked as unsupported)

This enables efficient detection and repair of data inconsistencies between distributed cache nodes by comparing compact tree representations rather than full data sets.
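The tree comparison described above can be sketched as follows: keys are sorted, grouped into fixed-size chunks, each chunk is hashed into a leaf, and leaves are folded pairwise into a root. Equal roots mean nothing to sync; otherwise only differing leaves need their keys pulled. This is a hedged sketch, not the PR's `BuildMerkleTree`: the names, SHA-256, and hashing key plus version (rather than values) are assumptions. Positional chunking also means one inserted key can change every later leaf; a real implementation may partition differently.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// merkleTree holds one leaf hash per chunk of sorted keys plus the root hash.
type merkleTree struct {
	Leaves [][]byte
	Root   []byte
}

// buildMerkleTree hashes key/version pairs in sorted key order, chunkSize keys per leaf.
func buildMerkleTree(versions map[string]uint64, chunkSize int) merkleTree {
	if chunkSize < 1 {
		chunkSize = 1
	}
	keys := make([]string, 0, len(versions))
	for k := range versions {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var leaves [][]byte
	for i := 0; i < len(keys); i += chunkSize {
		end := i + chunkSize
		if end > len(keys) {
			end = len(keys)
		}
		h := sha256.New()
		for _, k := range keys[i:end] {
			h.Write([]byte(k))
			h.Write([]byte{0}) // separator so "ab"+"c" hashes differently from "a"+"bc"
			var v [8]byte
			binary.BigEndian.PutUint64(v[:], versions[k])
			h.Write(v[:])
		}
		leaves = append(leaves, h.Sum(nil))
	}

	// Fold leaves pairwise up to a single root; an odd trailing node is promoted.
	level := leaves
	for len(level) > 1 {
		next := make([][]byte, 0, (len(level)+1)/2)
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i])
				continue
			}
			h := sha256.New()
			h.Write(level[i])
			h.Write(level[i+1])
			next = append(next, h.Sum(nil))
		}
		level = next
	}
	var root []byte
	if len(level) == 1 {
		root = level[0]
	}
	return merkleTree{Leaves: leaves, Root: root}
}

// diffLeaves returns indices of chunks that differ, including chunks present on one side only.
func diffLeaves(a, b merkleTree) []int {
	if bytes.Equal(a.Root, b.Root) {
		return nil
	}
	n := len(a.Leaves)
	if len(b.Leaves) > n {
		n = len(b.Leaves)
	}
	var diff []int
	for i := 0; i < n; i++ {
		if i >= len(a.Leaves) || i >= len(b.Leaves) || !bytes.Equal(a.Leaves[i], b.Leaves[i]) {
			diff = append(diff, i)
		}
	}
	return diff
}

func main() {
	local := map[string]uint64{"a": 1, "b": 1, "c": 1, "d": 1}
	remote := map[string]uint64{"a": 1, "b": 1, "c": 2, "d": 1}
	fmt.Println(diffLeaves(buildMerkleTree(local, 2), buildMerkleTree(remote, 2))) // [1]
}
```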

- Add FetchMerkle() and ListKeys() methods to HTTP transport
- Implement periodic auto-sync with configurable intervals and peer limits
- Add /internal/keys endpoint for key enumeration
- Refactor sync logic into modular helper methods
- Add comprehensive HTTP Merkle sync test coverage
- Enhance distributed metrics with auto-sync tracking
- Clean up HTTP transport code and remove redundant comments

This enables full anti-entropy synchronization over HTTP transport
and provides automatic background sync capabilities for distributed
cache consistency.

…semantics

- Add Merkle tree synchronization with timing metrics (build, diff, fetch durations)
- Implement tombstone versioning to prevent key resurrection during anti-entropy
- Add new HTTP endpoints for Merkle tree inspection (/internal/merkle, /internal/keys)
- Introduce configuration options for Merkle chunk size, auto-sync, and key enumeration caps
- Enhance delete operations with versioned tombstones to maintain consistency
- Add comprehensive test suite for Merkle sync edge cases (empty trees, no-diff, single missing keys, tombstone preservation)
- Update documentation with new distributed memory capabilities and configuration options

This enables robust distributed consistency by preventing stale data resurrection
and providing efficient anti-entropy synchronization between cache nodes.
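The anti-resurrection rule for versioned tombstones can be sketched as a single comparison: a remote entry is applied only if its version is strictly newer than whatever the local node holds, whether that is a live value or a tombstone. In this sketch the `entry` and `store` types are hypothetical, and bumping the version by one on delete is an assumption; the PR may derive versions differently.

```go
package main

import "fmt"

// entry is a hypothetical versioned record; Deleted marks a tombstone.
type entry struct {
	Value   string
	Version uint64
	Deleted bool
}

type store struct {
	items map[string]entry
}

// remove replaces the key with a tombstone at a version newer than the live value,
// so a later sync carrying the old value cannot resurrect it.
func (s *store) remove(key string) {
	cur := s.items[key]
	s.items[key] = entry{Version: cur.Version + 1, Deleted: true}
}

// applyRemote adopts a remote entry only if its version is strictly newer than the
// local entry, whether that local entry is a live value or a tombstone.
func (s *store) applyRemote(key string, remote entry) bool {
	if local, ok := s.items[key]; ok && remote.Version <= local.Version {
		return false
	}
	s.items[key] = remote
	return true
}

// get hides tombstoned keys from readers.
func (s *store) get(key string) (string, bool) {
	e, ok := s.items[key]
	if !ok || e.Deleted {
		return "", false
	}
	return e.Value, true
}

func main() {
	s := &store{items: map[string]entry{"k": {Value: "v1", Version: 3}}}
	s.remove("k") // tombstone at version 4

	stale := entry{Value: "v1", Version: 3} // a peer still holds the pre-delete value
	fmt.Println(s.applyRemote("k", stale))  // false: the tombstone wins
	_, found := s.get("k")
	fmt.Println(found) // false
}
```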

…ogress table

- Add comprehensive tombstone versioning and anti-resurrection guard details
- Document Merkle phase timing metrics and anti-entropy pull counters
- Include roadmap progress table showing current implementation status
- Expand descriptions of delete semantics and remote sync behavior
- Clarify DebugInject tombstone clearing functionality for testing

This update provides much clearer documentation for users and contributors
about the current state of distributed cache features and deletion handling.

- Add configurable tombstone TTL and periodic compaction to reclaim memory
- Implement WithDistTombstoneTTL and WithDistTombstoneSweep options
- Add tombstone metrics tracking (TombstonesActive, TombstonesPurged)
- Enhance quorum reads with targeted stale owner repair
- Refactor consistency logic with collectQuorum helper method
- Update README with new configuration options and metrics
- Add comprehensive test coverage for stale quorum scenarios
- Improve error handling and code formatting in existing tests

This addresses memory management concerns with tombstone accumulation
while improving distributed consistency guarantees through better
read repair mechanisms.

…stale tracking

- Fix variable capture in goroutines for Go <1.22 compatibility
- Add owner tracking to parallel fetch results to enable targeted repairs
- Implement stale owner detection during parallel consensus building
- Add targeted repair mechanism before full replica repair
- Improve code structure and comments for better maintainability

This ensures parallel quorum reads correctly identify and repair stale
replicas while maintaining compatibility with older Go versions that
require explicit variable capture in goroutine closures.

Copilot AI review requested due to automatic review settings August 23, 2025 15:51

trunk-io bot commented Aug 23, 2025

Running Code Quality on PRs by uploading data to Trunk will soon be removed. You can still run checks on your PRs using trunk-action - see the migration guide for more information.


Copilot AI left a comment


Pull Request Overview

This PR introduces a comprehensive distributed backend system with Merkle tree-based anti-entropy synchronization. The implementation adds Merkle trees for efficient divergence detection, tombstone-based delete semantics to prevent resurrection of deleted keys, and various supporting features like hinted handoff, gossip, and automatic synchronization.

  • Merkle tree anti-entropy for efficient sync between distributed nodes
  • Tombstone-based delete semantics with version ordering and TTL-based compaction
  • Auto-sync mechanism with configurable intervals and peer limits

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/merkle_sync_test.go | Test for Merkle sync convergence between nodes |
| tests/merkle_single_missing_key_test.go | Test for detecting and pulling single remote-only keys |
| tests/merkle_no_diff_test.go | Test for handling identical trees (no-op sync) |
| tests/merkle_empty_tree_test.go | Test for syncing between empty trees |
| tests/merkle_delete_tombstone_test.go | Test for tombstone-based delete semantics |
| tests/hypercache_http_merkle_test.go | HTTP transport Merkle tree operations test |
| tests/hypercache_distmemory_stale_quorum_test.go | Test for quorum reads with stale replica repair |
| tests/hypercache_distmemory_versioning_test.go | Minor spacing adjustment |
| tests/hypercache_distmemory_remove_readrepair_test.go | Minor spacing adjustment |
| tests/hypercache_distmemory_integration_test.go | Minor spacing adjustment |
| pkg/backend/dist_memory.go | Core distributed memory implementation with Merkle trees and tombstones |
| pkg/backend/dist_http_transport.go | HTTP transport with Merkle tree and key listing endpoints |
| pkg/backend/dist_http_server.go | HTTP server endpoints for Merkle and key listing |
| cspell.config.yaml | Spell check configuration updates |
| README.md | Documentation updates for new features |
| .github/instructions/instructions.md | Development guidelines |


Comment on lines +243 to +245
if t == nil {
	return nil, errNoTransport
}

Copilot AI Aug 23, 2025


The function checks if t == nil but this should never happen in normal Go usage since methods cannot be called on nil receivers without panicking. This check is redundant and might indicate a design issue.

Suggested change
if t == nil {
	return nil, errNoTransport
}

@hyp3rd hyp3rd merged commit 1cdbc00 into main Aug 23, 2025
4 of 5 checks passed

Labels: None yet

Projects: None yet

2 participants